Skip to content

Keep volume policies after migrating it to another primary storage#5067

Merged
nvazquez merged 11 commits intoapache:mainfrom
GutoVeronezi:keep-volume-policies-after-migrating-it-to-another-primary-storage
Sep 8, 2021
Merged

Keep volume policies after migrating it to another primary storage#5067
nvazquez merged 11 commits intoapache:mainfrom
GutoVeronezi:keep-volume-policies-after-migrating-it-to-another-primary-storage

Conversation

@GutoVeronezi
Copy link
Contributor

Description

In ACS, there are two ways to migrate a volume between storages:

  • migrating only the volume (migrateVolume);
  • migrating a VM with volume (migrateVirtualMachineWithVolume);

Both, although are in differents flows, work similarly: they create a new volume in the destination storage and then delete the old volume in the source storage; When we delete the old volume, we also delete the volume policies (snapshot scheduling).

Therefore, the new volume will not have the snapshot schedules and we will lose this information. This PR intends to copy the policies from source to destination volume before deleting the old volume;

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

It has been tested locally in a test lab.

  1. I created a VM;
  2. I configured snapshot schedules in the volume;
  3. I migrated :
    • volume (with migrateVolume);
    • VM and volume (with migrateVirtualMachineWithVolume);
  4. I checked if the policies were copied to new volume;

-- After applying this patch, the policies were copied.
Also, I created Java unit tests for the methods I changed.


@Override
public boolean copyPoliciesBetweenVolumesAndDestroySourceVolumeAfterMigration(Event destinationEvent, Answer destinationEventAnswer, VolumeInfo sourceVolume,
VolumeInfo destinationVolume, boolean retryExpungeVolumeAsync) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GutoVeronezi I think, it is better to separate copy policies and destroy volume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sureshanaparti I extracted the code that destroy the source volume to a method and added tests to it; However I kept this method, calling both copy policies and destroy volume to facilitate and guide the process.

@GutoVeronezi
Copy link
Contributor Author

Can anyone review this?

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of good improvements in log and exception messages as well as a solid looking re-factor of the volume object and - service. I t will need extensive regression testing on anything that relates to that. I don;t see hypervisor specific risks but these may be hidden in there.
CLGTM

Comment on lines +107 to +111
* @param destinationEvent
* @param destinationEventAnswer
* @param sourceVolume
* @param destinationVolume
* @param retryExpungeVolumeAsync
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove these if there is no sensible description to be given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland removed, thanks.

Comment on lines -67 to -70
/**
* Tests the following scenario:
* If the volume gets deleted by another thread (cleanup) and the cleanup is attempted again, the volume isnt found in DB and hence NPE occurs
* during transition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete this javadoc. it seems to add vallue to the method name, which doesn't describe what it does. validateHandleProcessEventCopyCmdAnswerNotPrimaryStoreDoNotSetSize is a bit cryptic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland This whole test class, as well as those in test package in this module were deleted. The tests in this module were already disabled for a long time; therefore we considered that they were irrelevant, as they were a bunch of code that were not being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, it isn't a change but an addition and a deletion, sorry and thanks

@apache apache deleted a comment from blueorangutan Jun 21, 2021
@apache apache deleted a comment from blueorangutan Jun 21, 2021
@apache apache deleted a comment from blueorangutan Jun 21, 2021
@apache apache deleted a comment from blueorangutan Jun 21, 2021
@apache apache deleted a comment from blueorangutan Jun 21, 2021
@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 311

@blueorangutan
Copy link

Trillian Build Failed (tid-1033)

@sureshanaparti sureshanaparti added this to the 4.16.0.0 milestone Jul 16, 2021
@GutoVeronezi
Copy link
Contributor Author

Can anyone review this?

@DaanHoogland
Copy link
Contributor

@GutoVeronezi a trick for getting the right person is running git blame on the files and see if there is the name of an active contributor there to ask directly.

@GutoVeronezi
Copy link
Contributor Author

@GutoVeronezi a trick for getting the right person is running git blame on the files and see if there is the name of an active contributor there to ask directly.

Thanks for the tip @DaanHoogland.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 671

@blueorangutan
Copy link

Trillian test result (tid-1395)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33932 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5067-t1395-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Copy link
Contributor

@sureshanaparti does this LGTY? cc @nvazquez

@nvazquez
Copy link
Contributor

Looking good, I think it will need manual testing for migrations and also volumes regression testing, @sureshanaparti is this something you can check please?

@GutoVeronezi
Copy link
Contributor Author

@nvazquez any update about this one?

@nvazquez
Copy link
Contributor

@weizhouapache can you please review/verify this improvement?

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✖️ el8 ✔️ debian ✔️ suse15. SL-JID 1105

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1109

@weizhouapache
Copy link
Member

@blueorangutan test

@blueorangutan
Copy link

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1941)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43228 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5067-t1941-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested ok on kvm.

(1) both root and data disk
(2) offline volume migration (migratevolume)
(3) online volume migration (migratevirtualmachinewithvolume)

@yadvr
Copy link
Member

yadvr commented Sep 3, 2021

@blueorangutan test centos7 vmware-67u3

@blueorangutan
Copy link

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1962)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40370 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5067-t1962-vmware-67u3.zip
Smoke tests completed. 89 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@nvazquez nvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nvazquez nvazquez merged commit 8ffba83 into apache:main Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants